KAC’S WALK ON THE n-SPHERE MIXES IN n log n STEPS


NATESH S. PILLAI AND AARON SMITH


Abstract. Determining the mixing time of Kac’s random walk on the sphere is a
long-standing open problem. We show that the total variation mixing time of Kac’s walk
on $S^{n-1}$ is between $\frac{1}{2} n \log(n)$ and $200\, n \log(n)$ for all $n$ sufficiently large. Our bound is thus
optimal up to a constant factor, improving on the best-known upper bound of $O(n^5 \log(n)^2)$
due to Jiang [Jia12]. Our main tool is a ‘non-Markovian’ coupling recently introduced by
the second author in [Smi14] for obtaining the convergence rates of certain high dimensional
Gibbs samplers in continuous state spaces.


1. Introduction 


In his 1954 paper [Kac54], Marc Kac introduced a random walk on a sphere as a model
for a Boltzmann gas. Kac’s walk on the $(n-1)$-sphere,

$$S^{n-1} = \Big\{ X \in \mathbb{R}^n : \sum_{i=1}^n X[i]^2 = 1 \Big\},$$

is a discrete-time Markov chain $\{X_t\}_{t \geq 0}$ that evolves as follows. At every step $t$,
choose two coordinates $1 \leq i_t < j_t \leq n$ and an angle $\theta_t \in [0, 2\pi)$ uniformly at random and
set

$$\begin{pmatrix} X_{t+1}[i_t] \\ X_{t+1}[j_t] \end{pmatrix}
= \begin{pmatrix} \cos(\theta_t) & -\sin(\theta_t) \\ \sin(\theta_t) & \cos(\theta_t) \end{pmatrix}
\begin{pmatrix} X_t[i_t] \\ X_t[j_t] \end{pmatrix}, \qquad
X_{t+1}[k] = X_t[k], \quad k \notin \{i_t, j_t\}. \tag{1.1}$$

Let $F : \{1, 2, \ldots, n\} \times \{1, 2, \ldots, n\} \times [0, 2\pi) \times S^{n-1} \to S^{n-1}$ be the map associated with this
representation, so that $X_{t+1} = F(i_t, j_t, \theta_t, X_t)$. Physically, Kac motivated this random walk
by considering the velocities of $n$ particles in a one-dimensional box. He assumed that these
particles were uniformly distributed in space, and the random walk $X_t$ models the change
in their velocities over time as collisions occur. The condition that $X_t$ be constrained to
a sphere corresponds to the principle of conservation of energy. Understanding the mixing
properties of this process is central to Kac’s program in kinetic theory. The article [MM13]
gives a useful description of this program.
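The update rule in representation (1.1) can be illustrated with a minimal Python sketch. The function names `kac_step` and `random_kac_step`, the 0-indexed coordinates, and the seed are assumptions made only for this illustration; they do not appear in the paper.

```python
import math
import random

def kac_step(x, i, j, theta):
    """Apply one update of representation (1.1): rotate coordinates (i, j) by theta."""
    y = list(x)
    c, s = math.cos(theta), math.sin(theta)
    y[i] = c * x[i] - s * x[j]
    y[j] = s * x[i] + c * x[j]
    return y

def random_kac_step(x, rng):
    """Pick i < j and theta uniformly at random, then apply the rotation."""
    i, j = sorted(rng.sample(range(len(x)), 2))
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return kac_step(x, i, j, theta)

rng = random.Random(0)          # fixed seed, purely for reproducibility
n = 5
x = [1.0] + [0.0] * (n - 1)     # start at a point of the unit sphere
for _ in range(1000):
    x = random_kac_step(x, rng)
energy = sum(v * v for v in x)  # stays equal to 1 up to rounding error
```

Each step is a planar rotation, so the sum of squares is preserved; this is the conservation of energy mentioned above.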

To state the main result of this paper, we recall some standard definitions. For measures
$\nu_1, \nu_2$ on a measure space $(\Omega, \mathcal{F})$, the total variation distance between $\nu_1, \nu_2$ is given by

$$\|\nu_1 - \nu_2\|_{TV} = \sup_{A \in \mathcal{F}} (\nu_1(A) - \nu_2(A)).$$

We denote the distribution of a random variable $Z$ by $\mathcal{L}(Z)$ and write $Z \sim \nu$ as a shorthand
for $\mathcal{L}(Z) = \nu$.

Let $\mu$ denote the normalized Haar measure on $S^{n-1}$. The Markov chain $\{X_t\}_{t \geq 0}$ has $\mu$ as
its unique stationary distribution on $S^{n-1}$. The mixing profile of $X_t$ is defined as

$$\tau(\epsilon) = \min\Big\{t : \sup_{X_0 = x \in S^{n-1}} \|\mathcal{L}(X_t) - \mu\|_{TV} < \epsilon\Big\}$$

and the mixing time is given by $\tau_{mix} = \tau(0.25)$.

Our main result is: 

Theorem 1. Fix $C_1 < \frac{1}{2}$ and $C_2 > 200$. If the sequence of times $\{T_1(n)\}_{n \in \mathbb{N}}$ satisfies
$T_1(n) < C_1 n \log(n)$ for all $n$, then

$$\lim_{n \to \infty} \inf_{X_0 \in S^{n-1}} \|\mathcal{L}(X_{T_1(n)}) - \mu\|_{TV} = 1. \tag{1.2}$$

If the sequence of times $\{T_2(n)\}_{n \in \mathbb{N}}$ satisfies $T_2(n) > C_2 n \log(n)$ for all $n$, then

$$\lim_{n \to \infty} \sup_{X_0 \in S^{n-1}} \|\mathcal{L}(X_{T_2(n)}) - \mu\|_{TV} = 0. \tag{1.3}$$

Theorem 1 implies that, for any $\delta > 0$ and $n > N_0(\delta)$ sufficiently large, the mixing time
$\tau_{mix}$ of Kac’s walk on $S^{n-1}$ satisfies

$$(1 - \delta) \tfrac{1}{2} n \log(n) \leq \tau_{mix} \leq (1 + \delta)\, 200\, n \log(n).$$

Theorem 1 also establishes the stronger result that Kac’s walk exhibits pre-cutoff with
window $[\frac{1}{2} n \log(n), 200\, n \log(n)]$ (see e.g., chapter 18 of [LPW09] for an introduction to
cutoff). We conjecture that Kac’s walk also exhibits cutoff at time $2 n \log(n)$. While we have
made no effort to optimize the size of the cutoff window obtained in this paper and both
our upper and lower bounds for $\tau_{mix}$ could be easily improved with our methods, a proof of
cutoff does not seem to follow from our methods and will likely require new ideas.

1.1. Relationship to Prior Work. There is an extensive literature on the mixing time of
Kac’s walk under various measures. In the sequence of papers [Jan01, CCL03, Mas03, Cap08],
the spectrum of Kac’s walk on $S^{n-1}$ was bounded and then computed. Although these results
imply a convergence rate for Kac’s walk in $L^2$, and a bound on the distance to stationarity
in $L^2$ implies a bound on the total variation distance to stationarity, these bounds do not
imply any bound at all on the total variation mixing time of Kac’s walk. This is because,
when $\mathcal{L}(X_0)$ is concentrated at a point, the initial $L^2$ distance to stationarity is not finite.
Stronger entropy bounds [CCR+08] also give no bound on the total variation mixing of Kac’s
walk. The previous best known bound on the mixing time of Kac’s walk on $S^{n-1}$, due to Jiang
[Jia12], is $O(n^5 \log(n)^2)$.

There are other versions of Kac’s walk; see the literature review in [Oli09] for an overview of
one of the most-studied such walks. The recent works [MM13, HM14] discuss the relationship
of the study of Kac’s walk to the original goals of Kac’s paper [Kac54].

2. Technical Overview 

We bound the mixing time of a Markov chain by applying the standard coupling lemma
(see e.g., Theorem 5.2 of [LPW09]):

Lemma 2.1. Let $K$ be the transition kernel of a Markov chain with unique stationary
distribution $\nu$ on state space $\Omega$. Let $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$ be two Markov chains started at $X_0 = x \in \Omega$
and $Y_0 \sim \nu$. Define the coalescence time

$$\tau(x) = \min\{t : X_t = Y_t\}. \tag{2.1}$$

Assume that the coupling of $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$ satisfies $X_t = Y_t$ for all $t \geq \tau(x)$. Then for
any $t \geq 0$,

$$\|\mathcal{L}(X_t) - \nu\|_{TV} \leq P[\tau(x) > t].$$

The skeleton of the coupling argument in our paper is quite similar to that in [Smi14]. As
in that paper, we construct a coupling of two copies of Kac’s walk over two phases. In the
first phase, we construct a ‘proportionate coupling’ (see Definition 3.1) between two copies
of Kac’s walk. We show that in this first phase, the two copies become close to each other
in the Euclidean norm in $O(n \log(n))$ time. In the second phase, also of length $O(n \log(n))$,
we construct a non-Markovian coupling that allows two copies of Kac’s walk that start very
close to each other to actually coalesce. We then apply Lemma 2.1, bounding the probability
$P[\tau(x) > T]$ that the random coalescence time $\tau(x)$ is larger than a particular deterministic
time $T \approx 200\, n \log(n)$.

This ‘two-step’ approach, with an initial ‘contracting’ phase followed by a second ‘coalescing’
phase, is a popular approach for proving the convergence of Metropolis-Hastings chains
on continuous state spaces (see e.g., [RR02, MS10]). However, there are some difficulties in
extending this approach to the study of high-dimensional Gibbs samplers such as Kac’s walk.
For most Metropolis-Hastings chains, two nearby Markov chains can coalesce in a single step,
and so the coalescence phase can be of length 1. Gibbs samplers do not share this property:
in $n$ dimensions, arbitrarily nearby Markov chains generally have probability 0 of coalescing
in fewer than $n$ steps. This means that the coalescence phase is quite lengthy, and it becomes
necessary to ensure that the two chains do not drift too far apart during this period. We
also mention that the presence of constraints on the state space (in this case, the constraint
that $X_t$ must be an element of $S^{n-1}$ and cannot be an arbitrary element of $\mathbb{R}^n$) presents
additional complications that are not present for most Metropolis-Hastings chains. In our case,
this constraint guarantees that any Markovian coupling scheme in the coalescence phase will
have $\mathbb{E}[\tau(x)] > C$ for some $C > 0$. The same phenomenon occurs in [Smi14] and many
well-studied discrete chains (see e.g., Lemma 8 of [Bor11]). See [HV03, HV07] for earlier
examples of non-Markovian couplings applied to the study of Markov chains on discrete state
spaces and [GJ08, Ken15] for more general discussion of when a ‘good’ Markovian coupling
exists.

2.1. Structure of the Paper. Section 3 describes phase 1 of the coupling. Section 4
describes phase 2, and the proof of Theorem 1 is given in Section 5.

2.2. Informal Discussion of ‘Greedy’ Couplings. Before giving a formal description of 
the second phase of our coupling in Section 4.1, we briefly explain where it comes from. 

The obvious ‘greedy’ approach to coupling a pair of Gibbs samplers $\{W_t, Z_t\}_{t \geq 0}$ in $\mathbb{R}^n$ is to
run a Markovian coupling that matches as many coordinates as possible at each step. That
is, we try to grow the set $E_t = \{1 \leq i \leq n : W_t[i] = Z_t[i]\}$ by as much as possible at every
step. Such a greedy coupling always exists, but it is easiest to analyze if $E_t \subset E_{t+1}$ for all $t$;
that is, if coordinates that agree once will agree forever. To phrase this second condition
in a slightly unfamiliar way, this means that the sequence of hyperplanes

$$H_t = \{v : \forall i \in E_t,\ v[i] = 0\} \tag{2.2}$$

should satisfy

$$H_{t+1} \subset H_t, \qquad (W_t - Z_t) \in H_t \tag{2.3}$$

for all $t \geq 0$. When this occurs, the coalescence time $\tau(x)$ satisfies

$$\tau(x) = \inf\{t \geq 0 : H_t = \{0\}\}. \tag{2.4}$$

We claim that no Markovian coupling of the square $\{X_t, Y_t\}_{t \geq 0}$ of Kac’s walk can satisfy
Inequality (2.3). Indeed, for two copies of Kac’s walk $X_t, Y_t$, the dynamics often force
$X_{t+1} - Y_{t+1}$ to ‘pop out’ of a candidate $H_t$. However, our notation suggests a much
larger family of ‘greedy’ couplings: any sequence $\{H_t\}_{t \geq 0}$ that satisfies Inequalities (2.3) and
(2.4) gives rise to a ‘greedy’ coupling of $\{Z_t, W_t\}_{t \geq 0}$, even if the hyperplanes aren’t of the
form (2.2).

There are many candidates for the sequence $\{H_t\}_{t \geq 0}$. Roughly speaking, in this paper we
construct the simplest possible sequence $\{H_t\}_{t \geq 0}$ that is not of the form (2.2): rather than
starting with $H_0 = \mathbb{R}^n$ and greedily reducing the dimension of our candidate $H_t$ as quickly
as possible until it reaches $\{0\}$, which gives hyperplanes of the form (2.2), we start with
$H_T = \{0\}$ at some fixed time $T > 0$, and then go back in time while greedily increasing
the dimension of our candidate $H_t$ as quickly as possible until it reaches $\mathbb{R}^n$. This sequence
of candidate hyperplanes, which is described carefully in Section 4.1 (Equations (4.1) and
(4.2) define the sequence of hyperplanes used in this paper), turns out to satisfy (2.3) and
(2.4) with high probability for Kac’s walk. Because of this, the resulting non-Markovian
coupling is almost as straightforward to analyze as the standard greedy Markovian coupling.

Our construction for $\{H_t\}_{t \geq 0}$ will not satisfy (2.3) with high probability for all Markov
chains, just as the usual construction in (2.2) did not. Nonetheless, we hope that this general
approach may yield other sophisticated couplings that are straightforward to analyze. We
also point out that, when coupling Gibbs samplers, it is very common to force the two
Markov chains to always select the same sequence of coordinates at every step^1, and we
make this choice in the present paper. In this situation, the above coupling constructions
would be applied to the Markov chains that are given after conditioning on the sequence of
coordinates to be updated. This modification makes no difference to the remainder of the
discussion.


3. Contraction Estimates 

In this section, we couple two copies of Kac’s walk on $S^{n-1}$ and obtain a one-step
contraction for the distance between these two copies in a suitable pseudo-metric. This is
achieved via the following coupling.

^1 In the notation of representation (1.1), this corresponds to choosing the same sequence $\{i_t, j_t\}_{t \geq 0}$.

Definition 3.1 (Proportional Coupling). We define a coupling of two copies $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$
of Kac’s walk for a single step. Fix $X_0, Y_0 \in S^{n-1}$. Let $(i_0, j_0, \theta_0)$ be the update variables
used by the random walk $X_1$ in representation (1.1), so that $X_1 = F(i_0, j_0, \theta_0, X_0)$. Choose
$\varphi \in [0, 2\pi)$ uniformly at random among all angles that satisfy^2

$$X_1[i_0] = \sqrt{X_0[i_0]^2 + X_0[j_0]^2}\, \cos(\varphi)$$
$$X_1[j_0] = \sqrt{X_0[i_0]^2 + X_0[j_0]^2}\, \sin(\varphi).$$

Then choose $\theta_0' \in [0, 2\pi)$ uniformly among the angles that satisfy

$$F(i_0, j_0, \theta_0', Y_0)[i_0] = \sqrt{Y_0[i_0]^2 + Y_0[j_0]^2}\, \cos(\varphi)$$
$$F(i_0, j_0, \theta_0', Y_0)[j_0] = \sqrt{Y_0[i_0]^2 + Y_0[j_0]^2}\, \sin(\varphi)$$

and set $Y_1 = F(i_0, j_0, \theta_0', Y_0)$.

Remark 3.2. This coupling forces $Y_1$ to be as close as possible to $X_1$ in the Euclidean
distance. For example, in dimension $n = 2$, we always have $X_1 = Y_1$ under this coupling. For
$X_0, Y_0 \in S^{n-1}$, this coupling forces the three points $(0,0)$, $(X_1[i_0], X_1[j_0])$ and $(Y_1[i_0], Y_1[j_0])$
to be collinear; in particular, it forces $X_1[i_0] Y_1[i_0],\ X_1[j_0] Y_1[j_0] \geq 0$.
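One step of the proportional coupling can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `proportional_step` and the 0-indexed coordinates are hypothetical, and in the degenerate case where both updated coordinates of X vanish the sketch simply uses the angle returned by `atan2(0, 0)` rather than drawing a uniform one.

```python
import math

def proportional_step(x, y, i, j, theta):
    """One step of the proportional coupling on coordinates (i, j).

    X is rotated by theta as in (1.1); Y is moved to the same polar angle phi
    in the (i, j)-plane, keeping its own radius sqrt(y[i]^2 + y[j]^2).
    """
    x1, y1 = list(x), list(y)
    c, s = math.cos(theta), math.sin(theta)
    x1[i] = c * x[i] - s * x[j]
    x1[j] = s * x[i] + c * x[j]
    phi = math.atan2(x1[j], x1[i])   # polar angle of (X_1[i], X_1[j])
    r_y = math.hypot(y[i], y[j])     # a rotation of Y preserves this radius
    y1[i] = r_y * math.cos(phi)
    y1[j] = r_y * math.sin(phi)
    return x1, y1

# Dimension n = 2: the two copies coincide after a single step.
x0 = [math.cos(0.3), math.sin(0.3)]
y0 = [math.cos(2.0), math.sin(2.0)]
x1, y1 = proportional_step(x0, y0, 0, 1, 1.1)

# Dimension n = 3: the updated pairs are collinear with the origin.
u0 = [0.6, 0.0, 0.8]
v0 = [0.3, math.sqrt(0.75), 0.4]
u1, v1 = proportional_step(u0, v0, 0, 2, 0.7)
cross = u1[0] * v1[2] - u1[2] * v1[0]
```

Moving Y to the polar angle phi is the same as rotating Y by a suitable angle, so the result is a legitimate update of the second chain.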


We set some notation for the remainder of the paper. Whenever we consider a pair of
copies of Kac’s walk $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$, we define

$$A_t[i] = X_t[i]^2, \qquad B_t[i] = Y_t[i]^2 \tag{3.1}$$

for all $t \geq 0$ and all $1 \leq i \leq n$. For $x \in \mathbb{R}^n$, we define

$$\|x\|_1 = \sum_{i=1}^n |x[i]|.$$

Finally, we define $[n] = \{1, 2, \ldots, n\}$.


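Lemma 3.3. Let $X_0, Y_0 \in S^{n-1}$. For $t \geq 0$, couple $(X_{t+1}, Y_{t+1})$ conditional on $(X_t, Y_t)$
according to the coupling in Definition 3.1. Then for any $t \geq 0$, Kac’s walk on $S^{n-1}$ satisfies

$$\mathbb{E}\Big[\sum_{i=1}^n (A_t[i] - B_t[i])^2\Big] \leq 2\Big(1 - \frac{1}{2n}\Big)^t. \tag{3.2}$$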


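Proof. Fix $X_0, Y_0 \in S^{n-1}$. For $t \geq 0$, couple $(X_{t+1}, Y_{t+1})$ conditional on $(X_t, Y_t)$ according to
the coupling in Definition 3.1, with update variables $i_t, j_t, \theta_t, \theta_t'$ and additional variable $\varphi$ as
in that definition. We calculate

$$\begin{aligned}
\mathbb{E}\Big[\sum_{k=1}^n (A_1[k] - B_1[k])^2\Big]
&= \frac{2}{n(n-1)} \sum_{1 \leq i < j \leq n} \mathbb{E}\Big[\sum_{k=1}^n (A_1[k] - B_1[k])^2 \,\Big|\, (i_0, j_0) = (i,j)\Big] \\
&= \frac{2}{n(n-1)} \frac{(n-1)(n-2)}{2} \sum_{k=1}^n (A_0[k] - B_0[k])^2 \\
&\quad + \frac{2}{n(n-1)} \sum_{i<j} \mathbb{E}\big[\big((A_0[i] + A_0[j]) \cos(\varphi)^2 - (B_0[i] + B_0[j]) \cos(\varphi)^2\big)^2\big] \\
&\quad + \frac{2}{n(n-1)} \sum_{i<j} \mathbb{E}\big[\big((A_0[i] + A_0[j]) \sin(\varphi)^2 - (B_0[i] + B_0[j]) \sin(\varphi)^2\big)^2\big] \\
&= \frac{n-2}{n} \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{4}{n(n-1)} \mathbb{E}[\cos(\varphi)^4] \sum_{i<j} \big((A_0[i] + A_0[j]) - (B_0[i] + B_0[j])\big)^2 \\
&= \Big(1 - \frac{2}{n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{3}{2n(n-1)} \sum_{i<j} \big((A_0[i] + A_0[j]) - (B_0[i] + B_0[j])\big)^2 \\
&= \Big(1 - \frac{2}{n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2 \\
&\quad + \frac{3}{2n(n-1)} \sum_{i<j} \big((A_0[i] - B_0[i])^2 + (A_0[j] - B_0[j])^2 + 2(A_0[i] - B_0[i])(A_0[j] - B_0[j])\big) \\
&= \Big(1 - \frac{2}{n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{3}{2n} \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{3}{n(n-1)} \sum_{i<j} (A_0[i] - B_0[i])(A_0[j] - B_0[j]) \\
&= \Big(1 - \frac{1}{2n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{3}{n(n-1)} \sum_{i<j} (A_0[i] - B_0[i])(A_0[j] - B_0[j]) \\
&= \Big(1 - \frac{1}{2n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2 + \frac{3}{2n(n-1)} \Big(\Big(\sum_{k=1}^n (A_0[k] - B_0[k])\Big)^2 - \sum_{k=1}^n (A_0[k] - B_0[k])^2\Big) \\
&= \Big(1 - \frac{1}{2n} - \frac{3}{2n(n-1)}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2,
\end{aligned}$$

where the last equality uses $\sum_{k=1}^n A_0[k] = \sum_{k=1}^n B_0[k] = 1$. Thus we have

$$\mathbb{E}\Big[\sum_{k=1}^n (A_1[k] - B_1[k])^2\Big] \leq \Big(1 - \frac{1}{2n}\Big) \sum_{k=1}^n (A_0[k] - B_0[k])^2. \tag{3.3}$$

For $t \geq 0$, let $\mathcal{F}_t = \sigma(\cup_{0 \leq s \leq t}(X_s, Y_s))$ be the $\sigma$-algebra generated by the random variables
$X_0, \ldots, X_t$ and $Y_0, \ldots, Y_t$. Repeatedly applying Equation (3.3), we have for all $t \geq 0$ that

$$\begin{aligned}
\mathbb{E}\Big[\sum_{k=1}^n (A_t[k] - B_t[k])^2\Big]
&= \mathbb{E}\Big[\mathbb{E}\Big[\sum_{k=1}^n (A_t[k] - B_t[k])^2 \,\Big|\, \mathcal{F}_{t-1}\Big]\Big] \\
&\leq \Big(1 - \frac{1}{2n}\Big) \mathbb{E}\Big[\sum_{k=1}^n (A_{t-1}[k] - B_{t-1}[k])^2\Big] \\
&\leq \ldots \\
&\leq \Big(1 - \frac{1}{2n}\Big)^t \sum_{k=1}^n \mathbb{E}[(A_0[k] - B_0[k])^2] \leq 2\Big(1 - \frac{1}{2n}\Big)^t.
\end{aligned}$$

This completes the proof. □

^2 If $X_0[i_0] = X_0[j_0] = 0$, all angles satisfy these equations. Otherwise, they have a unique solution, and
the value of $\varphi - \theta_0$ modulo $2\pi$ does not depend on $\theta_0$.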
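The one-step identity behind Equation (3.3) can be checked numerically. The sketch below is an illustration, not part of the paper: the function name `expected_next_sq_dist` is hypothetical, and it averages exactly over all coordinate pairs using the fact that $\mathbb{E}[\cos(\varphi)^4] + \mathbb{E}[\sin(\varphi)^4] = 3/4$ for a uniform angle.

```python
from itertools import combinations

def expected_next_sq_dist(A, B):
    """Exact value of E[sum_k (A_1[k] - B_1[k])^2] under the proportional coupling.

    A and B are the squared coordinates of the two current states, so each
    must sum to 1. The average runs over all pairs i < j.
    """
    n = len(A)
    d = [a - b for a, b in zip(A, B)]
    total_sq = sum(v * v for v in d)
    pairs = list(combinations(range(n), 2))
    acc = 0.0
    for i, j in pairs:
        untouched = total_sq - d[i] ** 2 - d[j] ** 2
        # The two updated coordinates contribute (d[i] + d[j])^2 * (cos^4 + sin^4),
        # whose expectation over a uniform angle is 3/8 + 3/8 = 3/4.
        acc += untouched + 0.75 * (d[i] + d[j]) ** 2
    return acc / len(pairs)

A0 = [0.1, 0.2, 0.3, 0.4]
B0 = [0.4, 0.3, 0.2, 0.1]
n = len(A0)
lhs = expected_next_sq_dist(A0, B0)
dist0 = sum((a - b) ** 2 for a, b in zip(A0, B0))
rhs = (1 - 1 / (2 * n) - 3 / (2 * n * (n - 1))) * dist0
```

For this example both `lhs` and `rhs` equal 0.15 up to rounding, matching the factor $1 - \frac{1}{2n} - \frac{3}{2n(n-1)} = 0.75$ at $n = 4$.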
Remark 3.4. We note that $X_t$ can be recovered exactly from a pair $(A, L) \in [0,1]^n \times \{-1,1\}^n \equiv \mathcal{X}$
by a map that we temporarily denote by $G : \mathcal{X} \to S^{n-1}$, whose inverse
exists except on a set of measure 0. Define the metric on $\mathcal{X}$ by $d_\lambda((A,L),(A',L')) =
\|A - A'\|_2 + [\ldots]$. Inequality (3.2) can be combined with standard bounds on the
contraction of simple random walk on the hypercube under the Hamming metric to show
that, under the proportional coupling,

$$\mathbb{E}[d_\lambda(G^{-1}(X_{t+1}), G^{-1}(Y_{t+1})) \mid X_t, Y_t] \leq \Big(1 - \frac{c}{n}\Big) d_\lambda(G^{-1}(X_t), G^{-1}(Y_t))$$

for some $0 < c < \infty$. This contraction estimate, combined with Proposition 30 of [Oll09],
implies that the relaxation time of Kac’s walk is $O(n)$. As shown in [Jan01, CCL03, Mas03,
Cap08], this is the correct order for the relaxation time, and our argument provides a short
alternative proof of the main result in [Jan01]. However, we cannot use this approach to
calculate the exact spectral gap, which was derived in [Mas03, CCL03, Cap08].

We close this section with two elementary bounds. For $S \subset \{1, 2, \ldots, n\}$, let $\tau_S = \min\{t :
S \subset \cup_{0 \leq s \leq t} \{i_s, j_s\}\}$. By the standard ‘coupon collector’ bound,

$$P[\tau_{[n]} > t] \leq n e^{-t/n} \tag{3.4}$$

for all $t \geq 0$. Let $S = \{1 \leq i \leq n : X_0[i] Y_0[i] < 0\}$. By Definition 3.1, we have
$\min_{1 \leq i \leq n} X_t[i] Y_t[i] \geq 0$ for all $t > \tau_S$.

Lemma 3.5. Let $Y \sim \mu$. Then for all $1 < c < \infty$ and any $1 \leq i \leq n$,

P[A[i] < n"^'’] < 2n"^+^ 

Proof. Let $\zeta_1, \ldots, \zeta_n$ be $n$ i.i.d. random variables with $N(0,1)$ distribution. Recall that the
law $\mathcal{L}(Y[i])$ of $Y[i]$ satisfies $\mathcal{L}(Y[i]) = \mathcal{L}\Big(\zeta_i \Big/ \sqrt{\sum_{k=1}^n \zeta_k^2}\Big)$. By Markov’s inequality and some direct
computation,

P|y[i] < n-^] = P| S < n-^ ( 3 . 5 ) 

Z-^k=l 

n 

<p|cj<n-q + p|^cJ >«') 

k=l 

< n”'’ + < 2 n"'’+^ 


and the proof is finished. □

4. Non-Markovian Coupling and Coalescence Estimates

In this section, we construct a more complicated non-Markovian coupling of two chains
$\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$ started at a pair of nearby points. In the proof of Theorem 1, we will
show that this coupling leads to a collision of $X_t$, $Y_t$ with high probability for sufficiently
large $t$. In order to couple the random walks $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$, it is enough to couple the
update sequences $\{(i_t(x), j_t(x), \theta_t(x))\}_{t \geq 0}$, $\{(i_t(y), j_t(y), \theta_t(y))\}_{t \geq 0}$ used to update them in
representation (1.1). Our coupling is as follows:

4.1. Non-Markovian Coupling. Fix $T_0 < T \in \mathbb{N}$. We construct the coupled update
sequences $\{(i_t(x), j_t(x), \theta_t(x))\}$ and $\{(i_t(y), j_t(y), \theta_t(y))\}$ for $T_0 \leq t < T$ via the following
procedure:

(1) For each $T_0 \leq t < T$, choose $1 \leq i_t(x) < j_t(x) \leq n$ uniformly at random and set
$i_t(y) = i_t(x)$, $j_t(y) = j_t(x)$.

(2) Define a sequence of partitions $\{\mathcal{P}_t\}_{T_0 \leq t \leq T}$ of $\{1, 2, \ldots, n\}$ inductively by the process:

• Set the partition $\mathcal{P}_T = \{\{1\}, \{2\}, \ldots, \{n\}\}$.

• Write $\mathcal{P}_{t+1} = \{P_1(t+1), \ldots, P_{\ell_{t+1}}(t+1)\}$ with $P_k(t+1) \subset \{1, 2, \ldots, n\}$. Let
$1 \leq u(t), v(t) \leq \ell_{t+1}$ be the indices satisfying $i_t(x) \in P_{u(t)}(t+1)$, $j_t(x) \in P_{v(t)}(t+1)$.
If $u(t) = v(t)$, set $\mathcal{P}_t = \mathcal{P}_{t+1}$. If $u(t) \neq v(t)$, construct $\mathcal{P}_t$ by merging the sets
$P_{u(t)}(t+1)$, $P_{v(t)}(t+1)$. Thus, if $u(t) < v(t)$, set $\mathcal{P}_t = \{P_1(t+1), \ldots, P_{u(t)-1}(t+1),
P_{u(t)+1}(t+1), \ldots, P_{v(t)-1}(t+1), P_{v(t)+1}(t+1), \ldots, P_{\ell_{t+1}}(t+1), P_{u(t)}(t+1) \cup
P_{v(t)}(t+1)\}$.

(3) If $\mathcal{P}_{T_0} = \{1, 2, \ldots, n\}$, continue the construction of the coupling by:

• Define the ordered set

$$S = \{s_1, \ldots, s_{n-1}\} = \{T_0 \leq t < T : \mathcal{P}_t \neq \mathcal{P}_{t+1}\}.$$

• For $t \in S^c \cap \{T_0, T_0 + 1, \ldots, T - 1\}$, choose the update variables $(\theta_t(x), \theta_t(y))$
according to the proportional coupling in Definition 3.1.

• Let $\Gamma$ denote the set of all joint distributions on $[0, 2\pi) \times [0, 2\pi)$ with both
marginal distributions uniformly distributed on $[0, 2\pi)$. For $t = s_k \in S$, choose^3
$(\theta_{s_k}(x), \theta_{s_k}(y)) \sim \gamma \in \Gamma$ so as to maximize the probability of the following
events:

$$\sum_{i \in P_r(t+1)} A_{t+1}[i] = \sum_{i \in P_r(t+1)} B_{t+1}[i], \qquad 1 \leq r \leq \ell_{t+1} \tag{4.1}$$

$$X_{t+1}[k]\, Y_{t+1}[k] \geq 0, \qquad k \in \{i_t(x), j_t(x)\}. \tag{4.2}$$

(4) If $\mathcal{P}_{T_0} \neq \{1, 2, \ldots, n\}$, continue the construction of the coupling by setting $\theta_t(y) =
\theta_t(x)$ for all $T_0 \leq t < T$.
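The backward partition construction in step (2) and the merge times $S$ of step (3) can be sketched with a union-find structure. This is a hedged illustration: the name `backward_partitions`, the 0-indexed coordinates, and the shift of $T_0$ to time 0 are assumptions made for the example.

```python
def backward_partitions(pairs, n):
    """Run the backward merging of step (2).

    pairs[t] = (i_t, j_t) for t = 0, ..., T-1, with coordinates in range(n)
    and T_0 shifted to time 0. Returns the block label of each coordinate in
    the partition at time T_0, and the sorted list S of merge times.
    """
    parent = list(range(n))          # partition at time T: all singletons

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]   # path halving
            k = parent[k]
        return k

    merge_times = []
    for t in range(len(pairs) - 1, -1, -1):  # go backward from T-1 to T_0
        i, j = pairs[t]
        ri, rj = find(i), find(j)
        if ri != rj:                         # u(t) != v(t): merge the two blocks
            parent[ri] = rj
            merge_times.append(t)
    labels = [find(k) for k in range(n)]
    return labels, sorted(merge_times)

labels, S = backward_partitions([(0, 1), (2, 3), (0, 1), (1, 2)], 4)
connected = len(set(labels)) == 1    # condition of step (3)
```

In this example the merges happen at times 3, 2 and 1, so $S$ has $n - 1 = 3$ elements and the partition at time $T_0$ is the single block required by step (3); the update at time 0 acts inside an existing block and therefore uses the proportional coupling.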

Remark 4.1. If (4.1) holds at time $T$, (4.2) holds for all $T_0 \leq t < T$, and $\mathcal{P}_{T_0} = \{1, 2, \ldots, n\}$,
then $X_T = Y_T$. Recall that forcing $X_T$ and $Y_T$ to coalesce is the goal of our coupling
construction.

Remark 4.2. By construction, we have that $\{i_t(x), j_t(x)\}_{T_0 \leq t < T}$, $\{i_t(y), j_t(y)\}_{T_0 \leq t < T}$ have
the correct distributions. Going through the construction, it is easy to verify that the
distribution of $\theta_s(x)$ (respectively $\theta_s(y)$) conditional on $\{i_t(x), j_t(x)\}_{T_0 \leq t < T}$ and $\{\theta_t(x)\}_{T_0 \leq t < s}$
(respectively $\{i_t(y), j_t(y)\}_{T_0 \leq t < T}$ and $\{\theta_t(y)\}_{T_0 \leq t < s}$) is uniform on $[0, 2\pi)$. Thus, the construction
in Section 4.1 gives a valid coupling of two copies of Kac’s walk.


^3 Note that there is generally not a unique coupling with this ‘maximal’ property. In our analysis, we do
not care which coupling with this property is used. A reader concerned about this choice can instead use a
coupling of the form implicitly constructed in the proof of Lemma 4.6 at this step, without changing any of
the bounds in the rest of the paper.



4.2. Bounds for coupling. The remainder of this section consists of some bounds that
will allow us to prove that two chains $\{X_t\}_{T_0 \leq t \leq T}$, $\{Y_t\}_{T_0 \leq t \leq T}$ constructed according to the
coupling described in Section 4.1 satisfy the conditions in Equations (4.1) and (4.2) for all
$T_0 \leq t < T$ with high probability, as long as $T - T_0$ is sufficiently large and $\|X_{T_0} - Y_{T_0}\|$ is
sufficiently small.

We first bound the probability that the condition in step 3 of our coupling, $\mathcal{P}_{T_0} =
\{1, 2, \ldots, n\}$, fails to hold:

Lemma 4.3 (Splitting of Partitions). Fix $\epsilon > 0$, $T_0 \in \mathbb{N}$ and $T > T_0 + (\frac{1}{2} + 2\epsilon) n \log(n)$.
Then for $\mathcal{P}_{T_0}$ as in Section 4.1 and all $n > N_0(\epsilon)$ sufficiently large,

$$P[\mathcal{P}_{T_0} = \{1, 2, \ldots, n\}] \geq 1 - 2n^{-\epsilon}.$$

Proof. Fix $M > 0$ and define the Erdős-Rényi graph $G_M$ on $n$ vertices with exactly $M$
(possibly repeated) edges

$$E(G_M) = \cup_{s = T-M}^{T-1} \{(i_s, j_s)\}.$$

Then

$$P[\mathcal{P}_{T_0} = \{1, 2, \ldots, n\}] = P[G_{T - T_0} \text{ is connected}] \geq 1 - 2n^{-\epsilon},$$

where the last inequality follows from Proposition 7.3 of [Bol01]. □

For $T_0 \leq s \leq T$, define the event $\mathcal{A}(s)$ by

$$\mathcal{A}(s) = \{\text{Equations (4.1) and (4.2) are satisfied for all } T_0 \leq t < s\}. \tag{4.3}$$

For $S \subset \{1, 2, \ldots, n\}$ and $X \in \mathbb{R}^n$, define

$$\|X\|_{1,S} = \sum_{i \in S} |X[i]|.$$

Also recall the definition of $A_t, B_t$ in Equation (3.1). The following bound implies that good
couplings will never drift too far apart:

Lemma 4.4 (Closeness of Good Couplings). Fix $T_0 < T$, and couple two chains $\{X_t\}_{T_0 \leq t \leq T}$,
$\{Y_t\}_{T_0 \leq t \leq T}$ according to the non-Markovian coupling defined in Section 4.1. Fix $T_0 \leq s \leq T$.
Then, on the event $\mathcal{A}(s) \cap \{\mathcal{P}_{T_0} = \{1, 2, \ldots, n\}\}$, we have

$$\|A_t - B_t\|_{1,S} \leq \|A_{T_0} - B_{T_0}\|_1 \tag{4.4}$$

for all $T_0 \leq t \leq s$ and all $S \in \mathcal{P}_t$. Furthermore, for all $T_0 \leq t \leq s$,

$$\|A_t - B_t\|_1 \leq n \|A_{T_0} - B_{T_0}\|_1. \tag{4.5}$$

Remark 4.5. The final bound of Lemma 4.4, Inequality (4.5), is quite weak: it allows
the distance $\|A_t - B_t\|_1$ to increase by as much as a factor of $n$. However, this additional
factor of $n$ has a very small impact on our final bound in Theorem 1; indeed, changing this
multiplicative factor of $n$ to a factor of $n^k$ would change our bound on the mixing time only
by an additive factor of $2kn \log(n)$.

We also point out that, a priori, it is not obvious that any polynomial bound holds.
Indeed, the distance between $A_t$ and $B_t$ can double over a single step. Thus, naively, one
obtains only a much larger bound, on the order of $2^n$.


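Proof of Lemma 4.4. We prove (4.4) by induction on $t$. It is trivial at time $t = T_0$. Now
assume that it holds at time $t$. We will show it also holds at time $t + 1$ by considering two
cases separately:

(1) $t \notin S$: In this case, $\ell_t = \ell_{t+1}$ and $P_k(t) = P_k(t+1)$ for all $1 \leq k \leq \ell_t$. For all $k \neq u(t)$,
it is immediate that $A_{t+1}[m] = A_t[m]$ and $B_{t+1}[m] = B_t[m]$ for all $m \in P_k(t)$, so by
the induction hypothesis

$$\|A_{t+1} - B_{t+1}\|_{1, P_k(t+1)} = \|A_t - B_t\|_{1, P_k(t)} \leq \|A_{T_0} - B_{T_0}\|_1.$$

For the remaining index $k = u(t)$, we calculate

$$\begin{aligned}
\|A_{t+1} - B_{t+1}\|_{1, P_{u(t)}(t+1)}
&= \sum_{m \in P_{u(t)}(t+1)} |X_{t+1}[m]^2 - Y_{t+1}[m]^2| \\
&= \sum_{m \in P_{u(t)}(t) \setminus \{i_t, j_t\}} |X_t[m]^2 - Y_t[m]^2| + |X_{t+1}[i_t]^2 - Y_{t+1}[i_t]^2| + |X_{t+1}[j_t]^2 - Y_{t+1}[j_t]^2| \\
&= \sum_{m \in P_{u(t)}(t) \setminus \{i_t, j_t\}} |X_t[m]^2 - Y_t[m]^2| + |(X_t[i_t]^2 + X_t[j_t]^2) \cos(\varphi)^2 - (Y_t[i_t]^2 + Y_t[j_t]^2) \cos(\varphi)^2| \\
&\quad + |(X_t[i_t]^2 + X_t[j_t]^2) \sin(\varphi)^2 - (Y_t[i_t]^2 + Y_t[j_t]^2) \sin(\varphi)^2| \\
&= \sum_{m \in P_{u(t)}(t) \setminus \{i_t, j_t\}} |X_t[m]^2 - Y_t[m]^2| + |(X_t[i_t]^2 + X_t[j_t]^2) - (Y_t[i_t]^2 + Y_t[j_t]^2)| \\
&\leq \sum_{m \in P_{u(t)}(t) \setminus \{i_t, j_t\}} |X_t[m]^2 - Y_t[m]^2| + |X_t[i_t]^2 - Y_t[i_t]^2| + |X_t[j_t]^2 - Y_t[j_t]^2| \\
&= \|A_t - B_t\|_{1, P_{u(t)}(t)} \leq \|A_{T_0} - B_{T_0}\|_1,
\end{aligned}$$

where the last step is by the induction hypothesis. This completes the proof for the
case $t \notin S$.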

(2) $t = s_k \in S$: By the same argument as in the previous case, we need only check that
$\|A_{t+1} - B_{t+1}\|_{1, P_{u(t)}(t+1)} \leq \|A_t - B_t\|_{1, P_{u(t)}(t+1) \cup P_{v(t)}(t+1)}$ and
$\|A_{t+1} - B_{t+1}\|_{1, P_{v(t)}(t+1)} \leq \|A_t - B_t\|_{1, P_{u(t)}(t+1) \cup P_{v(t)}(t+1)}$; the other expressions in
inequality (4.4) are satisfied automatically. By the symmetry of the problem, it is sufficient
to check the first of these two inequalities. We compute

$$\begin{aligned}
\|A_{t+1} - B_{t+1}\|_{1, P_{u(t)}(t+1)}
&= \sum_{m \in P_{u(t)}(t+1)} |A_{t+1}[m] - B_{t+1}[m]| \\
&= \sum_{m \in P_{u(t)}(t+1) \setminus \{i_t\}} |A_{t+1}[m] - B_{t+1}[m]| + |A_{t+1}[i_t] - B_{t+1}[i_t]| \\
&= \sum_{m \in P_{u(t)}(t+1) \setminus \{i_t\}} |A_t[m] - B_t[m]| + |A_t[i_t] + A_t[j_t] - B_t[i_t] - B_t[j_t]| \cos(\varphi)^2 \\
&\leq \sum_{m \in P_{u(t)}(t+1) \cup \{j_t\}} |A_t[m] - B_t[m]| \leq \|A_t - B_t\|_{1, P_{u(t)}(t+1) \cup P_{v(t)}(t+1)} \\
&\leq \|A_{T_0} - B_{T_0}\|_1,
\end{aligned}$$

where the last step is by the induction hypothesis.

Thus, in either case, Equation (4.4) holds at time $t + 1$, proving our first claim.

Equation (4.5) follows by calculating

$$\begin{aligned}
\|A_t - B_t\|_1 &= \sum_{k=1}^{\ell_t} \|A_t - B_t\|_{1, P_k(t)} \\
&\leq \sum_{k=1}^{\ell_t} \|A_{T_0} - B_{T_0}\|_1 \\
&= \ell_t \|A_{T_0} - B_{T_0}\|_1 \leq n \|A_{T_0} - B_{T_0}\|_1,
\end{aligned}$$

where the first inequality follows from Equation (4.4), and we are done. □

The following Lemma will be used to show that it is possible to couple Xt+i, Yt+i so as to 
satisfy Equations (4.1) and (4.2), as long as Xt,Yt satisfy those equations and have certain 
other good properties. 

Lemma 4.6 (Coupling Success). Fix positive reals $1 < p < q' < \frac{q}{2}$. Let $\theta, \theta' \sim \mathrm{Unif}[0, 2\pi)$
and let $S = A + B \cos(\theta)^2$ and $S' = C + D \cos(\theta')^2$ for some $0 \leq A, B, C, D \leq 1$ that satisfy
$|A - C|, |B - D| \leq n^{-q}$ and $B, D \geq n^{-p}$. Then for $n > N_0(p, q, q')$ sufficiently large, there
exists a coupling of $\theta, \theta'$ so that
p[^ = ^'] > 1 - 6 X (4.6) 

and

$$\cos(\theta)\cos(\theta') \geq 0, \qquad \sin(\theta)\sin(\theta') \geq 0, \tag{4.7}$$

where c = min(|-, q — 2q') > 0. 

Remark 4.7. In the notation of the construction in Section 4.1, we will use Lemma 4.6 with 
^ ^ + Xt[jtf). Roughly speaking, the upper bounds 

on \A — C\,\B — D\ in the statement of the lemma represent upper bounds on the distances 
between the medians and densities of two distributions that we wish to couple, and the lower 
bounds on B, D represent upper bounds on how quickly the densities change. By inspection 
of Inequality (4.8) below, we expect to be able to couple two random variables with high 
probability if the distance between their medians and densities is small compared to the 
derivative of the density. 

Note that we need both bounds to obtain the desired coupling inequality; as illustrated 
by the densities h is not enough to just have 

a bound on the distance between the medians. 


Proof of Lemma 4.6. We begin by showing that it is possible to satisfy inequality (4.6).
Recall the standard inequality that, for any distributions $\nu_1, \nu_2$ with densities $f_1, f_2$ on $\mathbb{R}$,

$$\|\nu_1 - \nu_2\|_{TV} \leq 1 - \int \min(f_1(x), f_2(x))\, dx. \tag{4.8}$$

The random variables $S, S'$ have densities


fs{.x) 


1 1 

2vr -A{x- A-B/2Y' 

11 


xe{A,A + B) 






9s'{x) = 


2vr - 4 (x-C-D/2)2’ 


x e {C,C + D). 


Thus, applying Inequality (4.8) to these two densities with 1 < p < g' < | and n > No{p, q, q') 
sufficiently large, 

^l-f 


' {A,A+B)n{C,C+D) 


mm{fs{x),gs'{x))dx 


< 1 - 


27r 


liA+n-i',A+B-2n-i') \/— A{x — A — 5/2)2 


X 


mm 


< 1 - 


. y/(-D - n-9)2 - 4(|x - C - B/2j + 2n-5)2 

^ ’ VD2 _ 4(a; - C - D/2)2 ^ "" 

1 


2 ^ J(A+n-’i' ,A+B-2n-‘t') B'^ — 4(x — A — 5/2)^ 


X 


(1 - 100 


n 


-9 


-A{x-C - D/2y 


)^dx 


n 


-<? 


< 1 - (1 - 8 X 10^ — , 

n-29 27r ^52 - 4(a; - A - 5/2)2 


n 


-<? 


1 


< 1 - (1 - 8 X 10^—tt)2(1 - — / , _ 

^ 27r y(A,A+2n-9')U(A+S-2n-'3',A+B) \/B"^ — 4(x — ^4 — 5/2)^ 


dx) 


n 


-<? 


< 1 - (1 - 8 X 10^^^)5(1 - ) 


n“2'?' ■ TT 


< 5 X ^ H —n 2'?/ 


TT 


This implicitly dehnes a coupling of S', S' that satishes P[S' = S"] > 1 — 5 x 10^77.2'^' + 

^n“2'?' > 1 — 6 X 10^ X 77.“''^ for all sufficiently large n. 

Once we have a coupling of $S, S'$ satisfying (4.6), we can extend it to a coupling of $\theta, \theta'$
that satisfies (4.7) via a suitable rotation. To be more precise, define the map $G : [0, 2\pi) \to
[0,1] \times \{-1,1\} \times \{-1,1\}$ by

$$G(\phi) \equiv (G_1(\phi), G_2(\phi), G_3(\phi)) = \Big(\cos^2(\phi),\ \frac{\cos(\phi)}{|\cos(\phi)|},\ \frac{\sin(\phi)}{|\sin(\phi)|}\Big).$$

If $\theta \sim \mathrm{Unif}[0, 2\pi)$, then $G_1(\theta)$, $G_2(\theta)$ and $G_3(\theta)$ are independent, with $P[G_2(\theta) = 1] =
P[G_3(\theta) = 1] = \frac{1}{2}$. Furthermore, $S$ depends only on $G_1(\theta)$ and $S'$ depends only on $G_1(\theta')$.
Thus, when extending a coupling of $S, S'$ to a coupling of $\theta, \theta'$ we are free to choose any
coupling of $G_2(\theta), G_2(\theta')$ and any coupling of $G_3(\theta), G_3(\theta')$. We choose the coupling for
which

$$G_i(\theta) = G_i(\theta')$$

for $i \in \{2, 3\}$. Thus, our coupling of $\theta, \theta'$ automatically satisfies both inequality (4.6) and
(4.7) whenever $S, S'$ satisfy inequality (4.6). □
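The sign-matching step can be sketched in Python. The names `G`, `G_inverse` and `extend_coupling` below are illustrative assumptions; the sketch takes a value of $\cos^2(\theta')$ as given (in the proof it would come from the coupling of $S, S'$) and copies the signs of $\cos(\theta)$ and $\sin(\theta)$.

```python
import math

def G(theta):
    """Return (cos^2(theta), sign of cos(theta), sign of sin(theta))."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * c, math.copysign(1.0, c), math.copysign(1.0, s))

def G_inverse(g1, g2, g3):
    """Recover the angle in [0, 2*pi) from (cos^2, sign of cos, sign of sin)."""
    c = g2 * math.sqrt(g1)
    s = g3 * math.sqrt(max(0.0, 1.0 - g1))
    return math.atan2(s, c) % (2.0 * math.pi)

def extend_coupling(theta, g1_prime):
    """Build theta' with cos^2(theta') = g1_prime and the same signs as theta."""
    _, g2, g3 = G(theta)
    return G_inverse(g1_prime, g2, g3)

theta = 2.5
theta_back = G_inverse(*G(theta))             # round trip through G
theta_prime = extend_coupling(theta, 0.3)     # 0.3 is an arbitrary example value
```

By construction `theta_prime` lies in the same quadrant as `theta`, which is exactly condition (4.7).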
5. Proof of Theorem 1


We begin by proving the lower bound of Theorem 1, Equation (1.2). Fix $C_1 < 1/2$ and let
$\{T_1(n)\}_{n \geq 2}$ satisfy $T_1(n) < C_1 n \log(n)$. Fix $X_0 = x \in S^{n-1}$, let $\{(i_t, j_t, \theta_t)\}_{t \geq 0}$ be the update
sequence associated with $\{X_t\}_{t \geq 0}$ and define

$$\zeta = \min\{t \geq 0 : \cup_{0 \leq s \leq t} \{i_s, j_s\} = \{1, 2, \ldots, n\}\}.$$

For $t < \zeta$, $X_t \in A \equiv \{X \in S^{n-1} : \exists\, 1 \leq i \leq n \text{ s.t. } X[i] = x[i]\}$, and $\mu(A) = 0$. Thus,

$$\begin{aligned}
\|\mathcal{L}(X_{T_1(n)}) - \mu\|_{TV} &\geq P[X_{T_1(n)} \in A] - \mu(A) \\
&\geq P[\zeta > T_1(n)] - 0.
\end{aligned} \tag{5.1}$$

By the standard coupon collector problem (see, e.g., [ER61]),

$$\lim_{n \to \infty} P[\zeta > T_1(n)] = 1. \tag{5.2}$$

Combining inequality (5.1) with (5.2) completes the proof of Equation (1.2).
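The time $\zeta$ in the lower bound is the first time at which every coordinate has been selected at least once. A minimal Python sketch, assuming a hypothetical helper `cover_time` and 0-indexed coordinates:

```python
def cover_time(pairs, n):
    """First index t at which every coordinate in range(n) has appeared in
    some pair (i_s, j_s) with s <= t; None if that never happens."""
    seen = set()
    for t, (i, j) in enumerate(pairs):
        seen.update((i, j))
        if len(seen) == n:
            return t
    return None

zeta = cover_time([(0, 1), (0, 1), (2, 3)], 4)   # coordinates 2 and 3 first appear at t = 2
never = cover_time([(0, 1)], 3)                  # coordinate 2 is never updated
```

Before this time at least one coordinate of the walk still equals its starting value, which is the measure-zero event used in (5.1).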

Let $a = 47$, $b = 18.1$. To prove inequality (1.3), we fix sequences $T_2(n)$, $T'(n)$ and $T''(n)$
that satisfy

200nlog(?7,) < T 2 (n) < n^, 

and $T'(n) > (4a + 5)\, n \log(n)$, $T''(n) > n \log(n)$ with

$$T'(n) + T''(n) = T_2(n)$$

for all $n$ sufficiently large. We then construct a coupling of two copies $\{X_t\}_{t \geq 0}$, $\{Y_t\}_{t \geq 0}$ of
Kac’s walk on the sphere with starting points $X_0 = x \in S^{n-1}$ and $\mathcal{L}(Y_0) = \mu$. The coupling
is as follows:

(1) Couple $\{X_t\}_{0 \leq t \leq T'(n)}$, $\{Y_t\}_{0 \leq t \leq T'(n)}$ by using, at each step, the proportional coupling
as in Definition 3.1.

(2) Conditional on $\{X_t\}_{0 \leq t \leq T'(n)}$, $\{Y_t\}_{0 \leq t \leq T'(n)}$, couple $\{X_t\}_{T'(n) < t \leq T_2(n)}$, $\{Y_t\}_{T'(n) < t \leq T_2(n)}$
according to the non-Markovian coupling constructed in Section 4.1. Thus in the
notation of Section 4.1, we have $T_0 = T'(n)$ and $T = T_2(n)$.

(3) Conditional on $\{X_t\}_{0 \leq t \leq T_2(n)}$, $\{Y_t\}_{0 \leq t \leq T_2(n)}$, we couple the remaining steps of the
two chains as follows. First, run $\{X_t\}_{t > T_2(n)}$ conditional on $\{X_t\}_{0 \leq t \leq T_2(n)}$ according
to its distribution. If $X_{T_2(n)} = Y_{T_2(n)}$, set $Y_t = X_t$ for all $t > T_2(n)$. If $X_{T_2(n)} \neq
Y_{T_2(n)}$, run $\{Y_t\}_{t > T_2(n)}$ conditional on $\{Y_t\}_{0 \leq t \leq T_2(n)}$ according to its distribution and
independently of $\{X_t\}_{t > T_2(n)}$.^4

Define the events

$$\begin{aligned}
\mathcal{E}_1 &= \{\|A_{T'(n)} - B_{T'(n)}\|_1 > n^{-a}\} \cup \{\min_{1 \leq i \leq n} X_{T'(n)}[i]\, Y_{T'(n)}[i] < 0\} \\
\mathcal{E}_2 &= \{X_{T_2(n)} \neq Y_{T_2(n)}\} \\
\mathcal{E}_3 &= \{\mathcal{P}_{T'(n)} \neq \{1, 2, \ldots, n\}\}.
\end{aligned}$$

By Lemma 2.1,

$$\begin{aligned}
\sup_{X_0 = x \in S^{n-1}} \|\mathcal{L}(X_{T_2(n)}) - \mu\|_{TV} &\leq \sup_{X_0 = x \in S^{n-1}} P[\mathcal{E}_2] \\
&\leq \sup_{X_0 = x \in S^{n-1}} \big(P[\mathcal{E}_1] + P[\mathcal{E}_3] + P[\mathcal{E}_2 \cap \mathcal{E}_1^c \cap \mathcal{E}_3^c]\big).
\end{aligned} \tag{5.3}$$

The remainder of the proof of inequality (1.3) will consist of bounding these three terms, in
order.

^4 The details of this third step of the coupling are irrelevant to the following analysis. They are provided
only to give a concrete coupling that satisfies the condition $P[\forall t > \tau(x),\ X_t = Y_t] = 1$ in the statement of
Lemma 2.1.

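Applying inequality (3.4) and then Lemma 3.3 and Markov’s inequality,

$$\begin{aligned}
\lim_{n \to \infty} \sup_{X_0 = x \in S^{n-1}} P[\mathcal{E}_1]
&\leq \lim_{n \to \infty} \sup_{X_0 = x \in S^{n-1}} \big(P[\|A_{T'(n)} - B_{T'(n)}\|_1 > n^{-a}] + P[\min_{1 \leq i \leq n} X_{T'(n)}[i]\, Y_{T'(n)}[i] < 0]\big) \\
&\leq \lim_{n \to \infty} \sup_{X_0 = x \in S^{n-1}} \big(P[\|A_{T'(n)} - B_{T'(n)}\|_2 > n^{-a - \frac{1}{2}}] + n e^{-T'(n)/n}\big) \\
&\leq \lim_{n \to \infty} \Big(2 n^{2a+1} \Big(1 - \frac{1}{2n}\Big)^{T'(n)} + n^{-1}\Big) = 0.
\end{aligned} \tag{5.4}$$

By Lemma 4.3,

$$\lim_{n \to \infty} \sup_{X_0 = x \in S^{n-1}} P[\mathcal{E}_3] = 0. \tag{5.5}$$

Recall the event $\mathcal{A}(s)$ from Equation (4.3):

$$\mathcal{A}(s) = \{\text{Equations (4.1) and (4.2) are satisfied for all } T'(n) \leq t < s\}.$$

Our next goal is, roughly, to show that conditional on the first phase of the coupling having
gone according to plan, the set $\mathcal{A}(s) \setminus \mathcal{A}(s+1)$ has small probability; this is stated formally
in Inequality (5.10).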

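For $T'(n) \leq s \leq T_2(n)$, define

$$\mathcal{B}(s) = \Big\{\min_{T'(n) \leq t \leq s}\ \min_{1 \leq i \leq n}\ \min(X_t[i]^2, Y_t[i]^2) \geq n^{-b}\Big\}.$$

We will bound $P[\mathcal{E}_2 \cap \mathcal{E}_1^c \cap \mathcal{E}_3^c]$ by showing that $P[\mathcal{A}(t+1)^c \cap \mathcal{A}(t) \cap \mathcal{B}(t) \cap \mathcal{E}_1^c \cap \mathcal{E}_3^c]$ is small
for all $T'(n) \leq t < T_2(n)$.

To this end, first note that

$$P[\mathcal{A}(T'(n))^c \cap \mathcal{E}_3^c] = 0.$$

Recall the set $S$ from the construction of the non-Markovian coupling in Section 4.1. We
now consider two cases: $t \in S$ and $t \in \{T'(n), \ldots, T_2(n)\} \setminus S$. In the case $t \in \{T'(n), \ldots, T_2(n)\} \setminus S$,
we have

$$\sum_{l \in P_r(t)} A_t[l] = \sum_{l \in P_r(t)} A_{t+1}[l], \qquad P_r(t+1) = P_r(t) \in \mathcal{P}_t, \qquad 1 \leq r \leq \ell_t.$$

This implies that

$$P[\mathcal{A}(t+1)^c \cap \mathcal{A}(t) \cap \mathcal{B}(t) \cap \mathcal{E}_1^c \cap \mathcal{E}_3^c] \leq P[\mathcal{A}(t+1)^c \cap \mathcal{A}(t) \cap \mathcal{E}_3^c] = 0. \tag{5.6}$$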

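Next, consider $t = s_k \in S$. We will apply Lemma 4.6 with the random variables:

$$S = \sum_{i \in P_{u(s_k)}(s_k + 1) \setminus \{i_{s_k}\}} X_{s_k}[i]^2 + (X_{s_k}[i_{s_k}]^2 + X_{s_k}[j_{s_k}]^2) \cos(\theta_{s_k}(x))^2 \tag{5.7}$$

$$S' = \sum_{i \in P_{u(s_k)}(s_k + 1) \setminus \{i_{s_k}\}} Y_{s_k}[i]^2 + (Y_{s_k}[i_{s_k}]^2 + Y_{s_k}[j_{s_k}]^2) \cos(\theta_{s_k}(y))^2 \tag{5.8}$$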


On the event of ^(sfc) O O Equation (4.5) of Lemma 4.4 gives that 


n 


1—a 


- BsJ\l < n\\AT^(^ri) - -Bt^hIIi < 

and so on the event A{sk) O B{sk) DS^ ClS^ we also have 

min min minfX+hl^, > n~^. 

mn)<t<s l<i<n ^ ^ ^ ^ 

Thus, on the event A{sk) O B{sk) £^, the random variables S, S' given in Equation 

(5.7) satisfy the assumptions of Lemma 4.6, with p = b, q = a — 1 and q' = ^ ^^). 

Thus, by Lemma 4.6, 


P 


{ 5: E [if] n A{sk) n B{sk) n n 


< 6 X lO^n s . 

By the constraint Yf!i=i -^t+i[i] = Yf!i=i Bt+i[i] = 1, we have 

{ E E r„+i[y}n.4(st) = .4(s» + i)"n.4(sj. 


2(q-1) 


(5.9) 






Combining this with inequality (5.9) gives 

P[>f(sfc + If nA{sk)AB{sk)A8 l'A8l] <6 X (5.10) 

By Lemma 3.5 and a union bound over times T^ln) < t < T 2 {n) and indices 1 < f < n, 
we have for all sufficiently large n, 

P[i3(T2(n))‘=] < 3n^-L (5.11) 

Combining inequalities (5.10) and (5.6) and then (5.11), we have 

¥[82 n 81 n 8f\ = P[^(T2(n))" n 81 n 8^] 

T2{n) 

< P[^(T2(n)-s + l)"n^(T2(n)-s)n£i"n^:3"]+P[^(T'(n))"n£i"] 

s=T2(n)-mn) 

T 2 {n) 

< Y (P[^(^ 2 (n) - s + 1)" n ^(T 2 (n) - s) n B{T 2 {n) - s) O 8 { O 8 f\ 

s=T2(n)-mn) 

+ P[B(T 2 (n) - s)'^]) + ¥[A{T!,{n)y O 8 f\ 


< 6 X 10^ T 2 (n) n ^ 5 ^ + T 2 (n) 3 + 0 

Q r) 2(<l — 1) ^ b 

< 6 X 10^ 5 + n® 3 . 

Since 5 > 18 and a > 6, this implies 

lim sup P [£’2 n £^3 n £^ 3 ] = 0 . (5.12) 

"■^°°Xo=3;eS"-l 

Inequality (1.3) follows immediately from inequalities (5.3), (5.4), (5.5) and (5.12). For 
sequences with T 2 {n) < for all n sufficiently large, this completes the proof of inequality 
(1.3). For sequences with T 2 {n) > infinitely often, inequality (1.3) follows from the case 

T{n) = by the fact that the total variation distance to stationarity of a Markov chain is 
monotonely decreasing in time. This completes the proof of Theorem 1. 